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Abstract 

We present a protocol for verification of "no such entry" replies from 

databases. We introduce a new cryptographic primitive as the underlying 

structure, the keyed hash tree, which is an extension of Merkle's hash 

tree. We compare our scheme to Buldas et al.'s Undeniable Attesters and 

Micali et al.'s Zero Knowledge Sets. 

C/3 ! 

^ , In the following, the term database refers to a system supplying the simplest 

form of databases, a table of {key, value) pairs with functions for keyed insertion 

_^^ . and retrieval. 

> 
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i/-^ ; 1 Motivation: Untrusted Databases 

O 

\C I There are situations where the users of a database system cannot trust the 

Cp , database. For example, the database might be outside the users' security 

perimeter, or might not follow the same security policies as the users. The users 
must therefore assume that the database could be taken over by an attacker. 

The entities writing the database entries want to be sure that the database 
delivers their entries on requests unchanged. The database maintainer wants to 
be able to defend itself against accusations of misbehavior. 
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c^ ■ 2 Threat scenario 



How can an attacker or malicious database maintainer deceptively influence the 
database's replies? The following list explains the attacks: 

1. Changing existing entries: The attacker can change the value of an 
existing entry. 

2. Creating false entries: The attacker can create {key, value) pairs him- 
self. 

3. Re— labeling existing entries: The attacker can insert existing values 
under different keys. 



4. Returning old entries: If the value of a {key, valuei) pair is overwritten 
by a new value2, the attacker can return the old valuei on request for key. 

5. Denying existing entries: On a search request, the attacker can make 
the database deny that a value exists for the given key in the table. 

This is not only of theoretical interest. Real-life examples are web-hosting 
solutions, where the changing parts of all web pages are kept in a central rela- 
tional database and the pages are constructed from templates. Another exam- 
ple is the Internet's Domain Name System (DNS), which is basically a partially 
distributed lookup-table. Attacks on DNS as listed above are described in 
(|l|), remedies are suggested in the DNS Security Extensions(|2) and a draft by 
Bellovin i^. 

2.1 Detecting Attacks 

The readers should be able to detect if the database is lying about entries. The 
database should be able to prove the correctness of its answers as long as the 
writers are honest. For most of the listed attacks, this can easily be achieved if 
public key cryptography is employed. It is not necessary to assume the existence 
of a global Public Key Infrastructure for this purpose. It is sufficient if the read- 
ers of the database have access to all the public keys of the writers (presumably 
a smaller set than the readers). The keys must be stored outside the database. 
With this information at hand, and routines for signature creation (Sk) and 
verification (Vfe), readers and writers can make attacks ^-[31 detectable under 
standard cryptographic assumptions. Here is how: 

1. Changing existing entries: Writers sign the values of their entries and 
include the signature in the value of the table. Readers check the signa- 
tures and detect changes by the database. 

{key, value) = (fcey, data||S'fe(data)) 

2. Creating false entries: The same applies here. Since the database has 
no access to the writer's secret keys, it cannot produce deceptive entries. 

3. Re— labeling existing entries: This calls for an extension: the writers 
now include the key in the data they sign: 

{key, value) = (fcey, data||S'fe(fce2/||data)) 

The readers can now check if the entry belonged to the key they gave in 
their search request. 

Points 21 and [3 however, pose a harder problem. If there is no information 
about existing entries anywhere outside the database, then these attacks cannot 
be detected. 

As for point ^ from the algebraic properties of standard digital signatures it 
follows that a signature on an old entry will still be valid after the entry has been 



replaced. There are signature schemes where the signer's cooperation is neces- 
sary for verification, called undeniable signatures. The concept was presented 
first by Chaum ((j|) in 1989. Such a scheme in our scenario would introduce 
much more communication than in the standard writer-database-reader setup. 
A writer who substitutes an existing entry would have to reliably notify the 
writer of the previous version, so that the previous author retracts the signa- 
ture (e.g. by sending a new "signature outdated" message in the verification 
protocol). Readers would need to communicate with writers who might not be 
available every time someone wants to validate a reply. We would like to avoid 
such a complicated and error-prone setup. 

The problem with point is that from the perspective of the database, it 
would mean proving that it does not know the table entry, an impossible feat 
in this scenario. Since there is no value returned, there can be no signature, 
unless the writers supply a special "no entry" value tied to every possible and 
unused key. 

There are ways around this, however, if we allow the writers and readers 
access to another service besides the database. The service is required to supply 
a single value on request; authenticated writers must be able to overwrite that 
value. We will call this service the announcement service; 

The writers store a signed, condensed description of the database's state. 
This value must be updated after every write to the database. The readers 
can use this description to validate the database's replies. We will call this 
summary of database state a state credential. The validation protocol consists 
of two parts: 

• A writer builds and publishes a new state credential from the previous 
one. 

• Readers check the validity of a database reply. This requires interaction 
with the announcement service and the database. 

3 State Credentials for Databases 

Our requirements for the state credential (to be short and uniquely bound to 
the database state) suggest the application of a cryptographic hash function. 
We will now describe several possibilities to build state credentials with hash 
functions. 

3.1 Simple Hash 

This is the simplest form of a state credential. The writer queries the database 
for all entries and inserts the key of his/her new entry. The writer then sorts all 
keys in a pre-defined order and computes the hash value over all of them. This 
value is signed and supplied through the announcement service. Formally: 
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where n is the number of entries in the database, ki is the «th key in the particular 
order and addition denotes string concatenation after adding a special "end of 
key" marker to the parameters. The simple hash is not a solution, because to 
check a database reply against this, a reader would have to download the whole 
database as well. 

3.2 Hash Chain 

Application of hash chains make the computation of the credential cheaper. The 
writer pulls the latest credential from the announcement service and "adds" the 
new entry's key. Formally: 

Co = hQ 

Ci = h{ci^i + ki) 

where q is the credential after adding the ith entry to the database and /i() is 
the hash of the empty string. Addition is defined as above. 

To validate a database reply, a reader would have to pull all keys up to the 
requested key in their order of addition. In the interesting case of a non-existing 
key, the reader has to pull the whole database. So this is still not satisfactory. 

3.3 Hash Trees 

Merkle's (jsj) hash trees allow fast (Log-Time) verification of digests over many 
entries. Hashes over pairs of entries (the keys in our scenario) are computed 
and the resulting hashes are again paired and new hashes computed and so on. 
The number of entries has to be a power of 2. Formally: 

H{l,j)^h(k,) 

Hi^,J) = HHiz + 1, [ii±l:lll]).i/(^ + 1, \'-±^])) 

where I = log2 (gentries), by "." we denote the concatenation of strings and kj 
is the jth key in an arbitrary order {H{1, 1) is the root of the tree). 

The state credential at the announcement service must be updated for each 
new entry and a hash tree root can only be computed if the number of entries 
is a power of two. So we must find a form of state credential that allows instant 
updating but still can use hash trees. 

The solution is to split the number of entries into powers of 2 and to generate 
hash trees of appropriate height for all of those powers. For this reason, the state 
credential will consist of a number of hash tree roots, at most log2(n — 1) where 
n is the number of entries, together with the height of the trees. When an entry 
is added to the non-empty database, and if the number of entries is odd after 
the addition, then a new hash tree of height zero (simply the hash of the key) 
is appended to the state credential. If the number is even, then the two most 
recent tree roots are combined (hashed together) to form the root of a higher 



tree. This is done recursively from the end. Note that we must start at the 
zeroth entry, since even an empty database needs a state credential. 
So if the bit representation is 

Llog2(n-l)J 

n - 1 = Y^ h-2' 

4=0 

and t — "^bi (the Hamming-weight of n — 1), then the credential consists of 
the roots of t hash trees of height i for every bi ~ 1. 

The database in turn keeps a counter of entries. It can thus deduce in which 
of the trees in the current state credential an entry resides. This is achieved by 
comparing the entry's number against the current set of trees and their heights. 

If a reader wishes to validate the databases' reply for a certain key k, the 
database has to supply a path in the tree leading to the leaf with k, as well 
as all pairs of hashes along that path. The reader can now verify that each 
pair's concatenation hashes to the corresponding hash in the next pair, up to 
the root of the tree in the state credential. Communication with the database 
for validation is bounded at worst by log2(n) • 2 • \h()\ bytes, where \h{)\ is the 
byte-length of the hash's output. 

The drawback of this scheme is that validation of a negative reply ( "no such 
entry" ) still requires downloading the whole database. 

4 Keyed Hash Trees 

Our contribution is the introduction of keyed hash trees. In these, the position 
of a leaf in the tree is dependent on the key corresponding to the leaf. Our 
construction uses a tree of height |/i()|, where ft- is a cryptographic hash function. 
For every {key, value) in the database, a leaf is inserted in the tree, where the 
path to the leaf from the root is defined by h{key) and the data in the leaf is 
hivalue). Leaves without a corresponding entry are implicitly set to the hash 
of the empty string h{). 

We call a sub-tree empty if there is no path leading through the sub-tree's 
root that leads to a leaf with a non-empty value. All leaves of an empty sub- 
tree have the value h{). The nodes in the layer above the leaves all have value 
h{h{),h{)), and so on. Note that all nodes in a layer have the same value, 
which in turn is derived from the hash of the empty word. It is easy to pre- 
compute these values. It allows identification of a sub-tree as being empty by 
a single table lookup, since all empty sub-trees of equal height have the same 
pre-computed root hash. 

From the collision resistance for cryptographic hash functions, it follows that 

Va, b : a ^ b : h{a.b) =/= h{b.a). 

The value of the tree's root thus depends on all the leaves' values and all the 
paths. In contrast to the schemes above, non-existing entries do have leaves 
and paths and can thus be validated. 



Storing and working on a binary tree of height 160 (for h = SHAl for 
example) seems daunting at first, since there would be 2^^°+^ — 1 entries, only 
a few orders of magnitude less then the number of hydrogen atoms in the whole 
universe. But all except n of the leaves are empty, where n is the number of 
entries in the database. We will show in Section [S] that it is sufficient to store 
only the non-empty leaves and the branches leading to them from the root. 

Before inserting a new entry {key, value) in the database, a writer requests 
all pairs of hashes on the path to the new entry. The writer then validates the 
hashes against the current state credential, which is the root of the keyed hash 
tree. The writer substitutes the hash in the respective leaf by the hash of the 
value, and computes a new root hash. He then signs this root hash and puts it 
on the announcement service after inserting the new entry in the database. 

To validate a database reply, the reader requests all pairs of hashes on the 
path derived from the key. For positive replies, the reader hashes over the 
returned value, compares that with the hash in the corresponding leaf and pro- 
ceeds as with a standard hash tree. 

If the reply was negative ( "no such entry" ) , then the path must lead to an 
empty leaf and — as in the hash tree scheme above — the hash values can be 
checked recursively up to the root. The root must be the same as the signed 
value received from the announcement service. 



5 Sparse Hash Tree Algorithms 

Our database should be able to respond quickly to requests for pairs of hashes 
along a given path. To do this, we need an internal representation of the keyed 
hash tree corresponding to the current database table. 

We are helped by the keyed hash tree's property that all empty sub-trees of 
equal height are identical (an algorithm for creating a list of all empty sub-trees' 
roots is given in figure ^|. 

Instead of storing 2^^^ — 1 nodes, we need to store only those nodes that are 
part of a path to an existing entry in the database, i.e. the hash of at least one 
key in the database describes a path leading through the node. To reduce the 
space further, we store only one node for every sub-tree that contains exactly 
one entry, i.e. the first node from the root down which lies on one single path is 
used to describe the whole sub-tree containing the corresponding leaf. We call 
the resulting data structure a sparse hash tree. 

The main problem is that searching in the tree is done from the root down 
to the leaf, while the hash values in the nodes are computed from the leaves up. 

Each stored node in our hash tree representation is a struct defined in 
listing 

The b [2] array contains pointers to the left and right children, or NULL 
if there is no respective child, flag indicates whether this node represents a 
BRANCH or a LEAF, hash is the hash of the tree rooted in this node. Only in leaf 
nodes is e not NULL and contains a pointer to a struct entry defined in listing 
121 Note that leaf nodes do represent the leaves of the tree, but only non-empty 



Listing 1: Pre-Coniputation of all empty sub-trees 

t empty.initCvoidX 
i = 0; 

Empty [i] = h() ; 
for (;i<159;i++) i 

Empty[i] = h( Empty [i - 1] . Empty [i -1] ) ; 
} 
return Empty ; 



Listing 2: Structures node and entry 

struct node t 

struct node *b [2] ; 
#define LEAF 1 
#define BRANCH 2 

int flag; 

struct entry .e; 

unsigned char* hash; 
>; 

struct entry { 



leaves have leaf nodes. The nodes contain data structures that allow dynamic 
computation of all node-values of the hash tree below their path. 

entry holds the information necessary to compute all nodes below the node 
pointing to it (see algorithm leaf node for details), path is a bit-string of Os 
and Is and describes the path to the actual leaf from the leaf node, i.e. the 
part of the path for which we don't store nodes, but generate them as needed, 
value contains the hash of the entry at the actual leaf. 

5.1 Computing Pairs along a Path 

The function rootpath defined in listing |31 on page [7| returns a list of pairs of 
hashes along a given path. The first two If statements handle the special cases 
that there is at most one entry in the sparse tree, rootpath calls the auxiliary 
functions nullnode, leafnode and branchnode. 

nullnode simply builds pairs of empty sub-tree roots in ascending order. 



Listing 3: rootpathO: Generating all pairs of hashes along path 

list rootpath(node root, bltstring path) { 
ifCroot — NULD-t 

/* empty sub -tree */ 

return nullnode (path) ; 
} 
If Croot->f lag — LEAFX 

/* exactly one leaf in subtree */ 

return leaf node (root , path); 
} 
if (root->f lag — BRAMCH)-t 

/* non-nnique path */ 

init (List); 

return branchnode (path , root, List); 
} 
} 



Listing 4: nullnodeO 



t nullnodeCbitstring path){ 
Inlt (List): 
forCi=0; i < length (pat h) ; i++){ 

push List, (Empty [i], Empty [i]); 
} 
return(Llst); 



Listing 5: singlepathO : Computing pairs of hashes in a sub-tree with exactly 
one given (path, value) 

list siiiglepath(bitstring path, char value[160]) { 
revpath = reverse (path); 
init (List): 
init (pair): 
i = i; 

bit = shift revpath; 
palrLbit] - value; 
pair [not bit] = Empty [0] ; 
push (List , pair); 
while(bit = shift revpath) { 

pairEbit] = h(List[0][0] . List[0][l]); 

pair [not bit] = Empty [i++]; 

push (List, pair); 
) 



leafnode calls singlepath to generate a temporary list of all non-empty 
sub-trees below the leaf node. It then compares the supplied path against the 
path to the leaf and selects the pairs from the temporary list for the maximum 
left-match of the paths. For the rest of the path, empty sub-tree roots are 
appended to the selected pairs. This list is returned. 

branchnode walks recursively down the tree as long as the path runs through 
BRANCH nodes. At each step it appends the hashes in the children of the 
current node to the list. At the point where the path changes to an undefined or 
LEAF node, branchnode calls nullnode or leafnode respectively, to complete 
the list. 

5.2 Inserting an Entry 

insert (algorithm|Hlon Dage ll8(l recursively walks down the tree along the given 
path, putting the traversed nodes on the stack. If the path runs into an empty 
subtree, a LEAF node with the given (path, value) is created. If the path runs 
through a LEAF node, the entry in the node is moved one step along its path 
down the tree, the node is converted to a BRANCH, and insert is called again 
on it. On the way up to the root, all the hashes on the path are adjusted. 

5.3 Deleting an Entry 

delete (algorithniElon Dagell9|l recursively walks down the tree until it reaches 
the LEAF node with the entry to delete. On the way up, it checks if the current 
node has only one child, which is a LEAF. If so, the entry in the leaf is attached 



Listing 6: leafnodeO: Generating pairs of hashes along a path below a leaf 
node 

list leafnode (struct node *node, bitstring path) -{ 
init (TmpList ) ; 
init (List); 

i=0; 

/* create list of non-empty nodes' hashes */ 

TmpList = singlepath(node->e->path , node ->e -> value ) ; 

/* compare given path against path of leaf */ 
Hhile(path[i] == node->e->path[i]) { 
append (List, TempList [i] ) ; 

> 

height = length(path) - i; 

/* if the two diverge , return empty nodes ' hashes */ 
while (height > 0) { 

pair = (Empty [height] , Empty [height] ) ; 

append (List , pair) ; 

} 

return List; 



Listing 7: branchnodeO: Generating pairs of hashes along a path below a 
branch node 



t branchnode (struct node *n, bitstring path) -{ 
init (pair); 

/* get hashes of children */ 
for(b = 0, 1){ 

if (n->b[b] != NULL)-t 

pairCb] = n->b [b] -> hash ; 
} else < 

pair[b] = Empty[length(path)] ; 
> 
> 

append (List , pair) ; 
bit = shift path; 

if (n->b[bit] != NULL && n->b [bit ] ->f lag == BRANCH){ 
/* Mialk down */ 

append (List, branchnode (path , n->b[bit], List)); 
> else -t 

if(n->n[bit] != NULL && n->b [bit ]-> flag == LEAF){ 
/* we hit a leaf */ 

append (List, leaf node (path , n->b[bit])); 
} else i 

/* walked into an emp ty sub - tree */ 
append (List , nullnode ( path) ) ; 
> 
> 



to the parent, and the child deleted. This is to handle situations where two leaves 
hang at the end of a stalk. If one of the leaves is removed, the stalk would be a 
series of roots of subtrees with exactly one entry. To keep storage minimal, this 
should be avoided, delete attaches the lonely leaf at the uppermost branch of 
the stalk. 

5.4 Properties 

We assume that the hash function h behaves as an ideal hash function would, 
an assumption often used in cryptography called the Random Oracle Model. It 
implies that the output of h is indistinguishable from the output of a random 
function. Under this assumption, our algorithms have the following properties: 

The sparse tree in memory is nearly balanced. If ft-'s output behaves as ran- 
domness, the paths in the tree will be random walks starting at the root. 
For a large number n of entries, — paths will lead through each node at 
the ith layer in the tree, and the average path length is logj (n) . 

Space for the sparse tree is bounded by twice the number of entries in the 
database. We store n entries at n leaves, and each pair of nodes has one 
parent-node. This means that 

nog2(")i 

maxnodes{'n) = > 2* 

i=0 

— 2ri°g2(")i+i _ \ 

< 2-n 



Pairs along paths to a non-existing key can be computed quickly. If the key is 
not in the table, then the path will enter an empty subtree after log2(n) 
steps, in the mean. For one million entries, for example, this means that 
after traversal of 20 nodes on average, nullnode will be called, and all 
that remains to be done is table lookups. 

Readers recognize non-existing keys after log2(n) steps from the root, in the 
mean. The pre-computation shown in ^ consists of 160 calls to the hash 
function. After this, a reader simply checks the returned pairs from the 
database against the pre-computed table Empty. 

Maximum message size per validation is 2-|ft,()|-|/i()| bits. Two pairs of |/i()| bit 
hashes per step lie on a path with \h()\ steps. For SHA-1 as h, this would 
mean 6400 bytes per validation. If stronger collision resistance is required, 
a cryptographic hash function with longer output may be chosen. While 
the resistance grows exponentially with the hash size, the messages grow 
only linearly. 
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The database can be distributed over several machines. The insert, delete 
and rootpath functions are independent of the actual height of the stored 
tree. As long as writers and readers know how to compute the 2*^+^ — 1 
hashes at the root of the tree, they can use d independent databases. This 
would also allow d concurrent writes, by locking at the branches. 

6 Related Work 

There are other schemes with different objectives related to the one just pre- 
sented. We will discuss the differences to our proposal. 

6.1 Undeniable Attesters 

Ahto Buldas et al. presented schemes for accountable certificate management 
in 2000 (l|6|) and (|7|)). In their scenario, the database is a part of a certificate 
authority (CA), i.e. the list of valid certificates. Their goal is to make sure that 
there is never an ambiguity about the state of a certificate. On request for a key 
k, the database sends an attester p = P{k, S), which is a cryptographic state- 
ment about the presence or absence of the key k in the table S. The database 
also returns a digest d — D(S) which is a summary of the database's table 
S. Any party can then call a verification algorithm V{k,d,p) which returns 
"Accept" if the statement p about k was correct for table summary d or else 
"Reject" . Buldas et al. call an attester undeniable if a CA can produce two con- 
tradicting attesters for the same key with only negligible probability. Formally, 
this is defined as follows: 

Let £A be the class of probabilistic algorithms of polynomial runtime. Let 
k be the security parameter (in our context the bitlength of the hash function's 
output). The attester A is given by the tuple of algorithms A = {Q,V,'D,VY. 
Its resilience against an attacker A of class £A is defined as 

UN^jl(A) =Vv[{x,d,p,p) <- Ail^) : V{x,d,p) ^ Accept A V{x,d,p) = Reject] . 

If k can be chosen as to minimize UN_4 || below any given e, then A is called 
undeniable. 

The paper examines different schemes for attesters and concludes with an 
efficient, undeniable attester based on search trees. 

6.1.1 Search Trees 

A search tree (see for example Knuth (S)) is a binary tree with the additional 
property that there is a comparison relation < and each node contains a value 
V for which the following holds: 

Vi < V < Vr, 



^G serves only to select a hash function for a given security parameter k. 
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where vi and Vr are the values in the left and right child of the node, respectively. 
If one or both children do not exist, the empty string is used instead. Buldas 
et al. add another field to each node, which holds the hash hy = h{hy^.v.hy^). 
The digest d — the root of the search tree — is signed by the CA and published 
through an untrusted Publication Authority (PA). The digest corresponds to 
our state credential and the PA to our announcement service. 

A proof p output by P for a key /c is a list p ~ (Vb, . . . , Vm) of values 
Vi = {hL,ki,hR), where ki is the value of a node and hL^ha are the hashes 
stored in the left andx. right child of the node, repectively, or the empty string 
if there is no such node. The values are selected such that kg = k ii k is in the 
database. If k is not in the database, then the tree contains two keys j and I 
such that I is the smallest value larger than k and j the largest value smaller 
than k. By construction of the tree, there is a path from the node of j to the 
root leading through I or vice versa. The key of j and I which is lower in the 
tree is used as ko in that case. If k is larger or smaller than all keys in the search 
tree, then I or j is set to the largest or smallest key in the tree, respectively. 

p together with the published digest d allows to prove the absence of a key. 
The verifier can check the hashes from fco to the root and verify the strict order 
of child nodes. If a key k is not in the tree, then p will contain j, I such that 
j < k < I A -<3ki € p : j < ki < k\/ k < ki < I. By the strict order and lack of 
right children of j and left children of I, k cannot be a node's value in the tree. 

The verifier V{x, d,p) returns "Error" if any of the following fails: 

• Checking the order of keys in p. 

• Computing and comparing the hashes along the path given in p up to the 
root. 



• 



Comparing d against the root hash in p. 



If X is the first key in p, V returns "Accept" , else "Reject" . 

6.1.2 Differences to our Proposal 

The attester states absence or presence in a list, exclusively. A state credential 
includes the value of the entry under the key, so that changes of an entry can 
be expressed. 

Buldas et al. assume that the database is reigned by a single entity which is 
trustworthy at the moment when an attester is issued, but may turn untrusted 
or unavailable later. That the database signs the digest itself establishes a 
different scenario than in our setting. 

To show that the database cheated, a user has to find another attester con- 
tradicting the attester he/she received. In our setting, a reader can prove that 
the database cheated immediately and without contacting other readers. 

6.1.3 Reduction of State Credentials to Attesters 

We can build undeniable attesters for keys from signed state credentials. Since 
values of entries are irrelevant for attesters, we will substitute the {key, value) 
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pairs in our algorithms by {key, key) pairs. Since the empty word might be an 
entry's key in the generahty of the proof, we wiU use the reserved key r instead 
in that case. The digest d is the state credential, but is supplied by the database 
instead of the announcement service. Our (P, D, V) are defined as follows: 

P{x, S) is the list of pairs of hashes along the path given by h{x) in the keyed 
hash tree generated from all entries in S. 

D{S) is the root of said keyed hash tree. 

V{x,d,p) outputs "Error" if any of the following fails: 

• Comparing the hash at the leaf corresponding to x against h{x) or 
h{). 

• Hashing the pairs of hashes up along the path h{x). 

• Comparing the resulting root hash against d. 

If the leaf value at the end of path h{x) is the empty hash, then V returns 
"Reject". If the leaf value is h{x), then "Accept". 

6.1.4 Proof of Undeniability 

Assume that some A e £A outputs {x,d,p,p) with probability e, such that 
V{x, d,p) = Accept and V{x, d,p) = Reject. Since V did not return "Error" , all 
the pairs of hashes in p and in p resolved up to d. This means that A produced a 
collision in the hash h with probability e, and the values xi,X2,yi,y2 '■ {xi .X2) ^ 
(2/1-2/2) : h{xi.X2) = h{yi.y2) are members of p and p. Thus we have reduced 
the security of our attester to the collision resistance of the hash h. 

6.1.5 Considerations about the underlying Data Structures 

Bulda's et al. use the search keys as they are, because the proofs of non- 
membership in the table rely on the greater-than relation between the keys. 
This has the disadvantage that the tree becomes unbalanced if the inserted 
entries are not uniformly distributed or if the first entry (the subsequent root) 
is not near the median of the set of entries. Unbalanced search trees cause longer 
searches, as more than log2(ri) nodes need to be traversed for some keys. The 
paper does not explain whether there is any re-balancing (see for example (|8|)) 
done on the search tree. Re-balancing this particular tree type would change 
all the hash-entries on the path from the previous root to the new one. Our 
sparse hash tree in memory is always balanced for large n. 

Using the keys as they are makes the attester's size unpredictable. With 
state credentials, the size is fixed (and quite small). 
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6.2 Zero-Knowledge Sets 

Unknown to this author, S. Micah, M. Rabin and J. Kihan presented a more 
general scheme, called Zero-Knowledge Sets (|9|) in 2003. Their goal was to 
prove set-membership non-interactively and that without leaking any other 
information about the set. The data structure underlying their scheme is almost 
identical to the keyed hash trees defined above. 

Micali et al. use a commitment scheme to bind the database to its previous 
statements about its contents. 

6.2.1 Pedersen's Commitment Scheme 

In commitment schemes, a prover P commits to a message m by making public 
a commitment string c and computing a proof r. Before P publishes m or 
r, nothing should be inducable about the message m from c. After P sends 
message rn, a verifier V can check whether this was the message committed to 
by c. For this, P supplies V with r. A commitment scheme is secure if P can 
produce contradicting proofs r, r' only with neglegible probability. 

Pedersen's scheme requires four public parameters, i.e., two primes p, q such 
that q\p — 1, and two generators g,h, where g and h generate G C F* the 
subgroup of order q in ¥*. P commits to m by publishing c = g"^h^ mod p. 
r is the corresponding proof, a randomly chosen value < q. To verify the 
commitment, V re-computes c for herself and compares. 

To produce two proofs r, r' for differing messages m, m! , P would have to find 
r, r' such that g^K^ — 5™ K^ mod p. Assume she succeeded. She then could 
compute 5™-™' = /I'-'-'-modp and from this [g^-^'y'-r = (j^r'-ry-r' ^ 
hvaodp and thus get logg{h)modp, breaking the discrete logarithm mod p. It 
follows that the security of the Pedersen commitment is reducible to the discrete 
log problem. It also follows that if logg{h) is known to the prover, she can fake 
arbitary commitments. 

Micali et al. use this scheme to construct a hash function H : {0, 1}* -^ 
{0,1}'^ where k is the bitlength of prime p. They use this hash function to 
process a data-structure very much like our keyed hash tree from sectional 

Their prover builds the tree by first inserting all leaves derived from the 
database's entries, such that the value in the leaf is H{D{x)), where D{x) is the 
value stored in the database under key x, and the leaf's position in the tree is 
determined by the path described by H{x). The prover adds all nodes on those 
paths to the root, their values remain undefined until later. She adds those 
nodes whose parents are now in the tree, but does not repeat this recursively. 
These nodes correspond to the empty subtree's roots in our construction. 

In the next step, the prover generates a random exponent e^ for every node 
V in the structure, and stores hy = ft,*^" in it if w is a non-empty leaf or the 
ancestor of one, and h^ — g'^^ for the empty leaves. Thus the prover knows 
logghy for these nodes, a property exploited later. 
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The prover now computes commitments for the leaves by computing c^ = 
girivf^rv ^ where niv = H{D{x)) or 0, hy the value stored in the last step, and r^ 
a random value chosen per leaf. 

In the last step, the internal nodes of the tree get their commitment values 
generated. The process runs from the leaves upwards as in Merkle's scheme. 
Each internal node u is associated a value ?n„ = -ff(cuO: ^uOj c„i, /i„i), where 
uO, m1 are the left and right children of node u. From this and the value /i„ 
stored ahead in u, c„ is computed as g"^"h^" for a randomly chosen r„ G F*. 
The commitment value for the whole database is the commitment of the tree's 
root Co- This value is published before any queries are answered. 

To prove that the value stored under key a: is y in the database, the prover 
provides the tuples {niv , Cy , hy , Vy) for every node on the path through the tree 
given by H{x), together with the values Cyo, hyQ, c^i, hyi for v's siblings. 

To check this proof, verifier V 

1. compares the leaf's my against H{y). 

2. checks recursively that to„ = iJ(c„o, hyo, c^i, hyi) 

3. checks for every v on the path that hy = /i*^" and Cy = g"^^hy^ . 

For a proof of the non-existance of key x in the database, the prover provides 
all the nodes' values along the path given by H{x) as long as these nodes are 
in the generated tree. The last of these nodes, m, will have iriu = 0. If P would 
supply this to V , then the verifier would learn that there are no non-empty 
nodes in this sub-tree. To disguise this fact, P generates a branch of non- 
empty nodes down to a virtual, empty leaf (the computation goes bottom-up 
however). Node u becomes the branch's root for this proof, the value to„ = is 
substituted by the value in the uppermost node of the freshly generated branch. 

The prover can do this convincingly because she can re-commit to any 
(mmTu) in node u. This is caused by the special construction of /i„ for empty 
leaves, where P knows the value of logg{hu) by design. This allows to "glue" 
the generated branch of nodes, complete with their commitments, to node u. 

The verification runs as above, except that V checks for m[jix) = 0. 

6.2.2 Differences to our proposal 

In our proposal, the writers are different entities from the database, but they 
compute the committment string. They are continuously updating and chang- 
ing entries. In Micali et al.'s terminology, our writers commit to the history of 
their changes to the database, while the database later proves that it did reply 
in accordance to it state after the latest update. Zero-knowledge was no re- 
quirement in our design. Since the motivations and requirements are different, 
the mechanisms behind Micali et al.'s scheme and ours differ in several points: 

Multiple, mutually trusting authors in our design, multiple database au- 
thors (writers) were a prerequisit. It is not obvious, how zero knowledge 
sets could be adapted for multiple writers. Two writers would have to 
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communicate at least the secret e^ , r^ for each node to be able to insert 
entries. This would require a secure authenticated channel between all 
writers. 

Zero knowledge the zero-knowledge requirement is violated in our scheme, 
because a proof about any key will leak information about the existence 
or non-existence of other keys. 

Database modification one of our requirements was that the database can 
be modified by the writers who can re-compute the credential/attester/- 
commitnicnt for the database. The zero-knowledge property of Micali et 
al.'s scheme is lost if the database is allowed to change/add/rcmove en- 
tries. This is because the commitment values change along the path to a 
modified node and thus give away information about the minimum num- 
ber of modified entries. This makes their construction extremely static 
and not suitable for our application. Micali et al. mention this in their 
open problems section. 

Speed because Micali et al.'s scheme must satisfy additional constraints, it is 
much slower (five modular exponentations per node versus one application 
of an optimized hash function) . 

7 Summary 

We have presented an efficient mechanism that allows readers to verify the 
replies of a database with the help of the writers. To achieve this, we introduced 
a new cryptographic primitive, an extension of Merkle's hash trees. A verifica- 
tion protocol between reader and database validates a reply with a low amount 
of data traffic. This extends even to those replies where the database denies 
the existence of the requested entry. The mechanism thus allows a database to 
prove that it does not know about such an entry. We examined the relationship 
to work in the area of certificate management, and showed how our proposal 
can be applied there as well. In comparison to the more general Zero-knowledge 
sets by Micali et al., our scheme has the advantage that it is faster and allows 
subsequent modifications by multiple writers. 
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Listing 8: insert () : Insertion of a new entry 



uct node * insert (bit string path, char value [160] , struct node ♦node) < 
if (node == NULL ){ 

/* virgin sub-tree: create new leaf */ 

init (List); 

init (newnode); 

init (entry); 

newnode ->f lag = LEAF; 

List = leafnode(path , value); 

entry ->path = path; 

newnode ->e = entry; 

newnode ->hash = h(List[0][0] . List[0][l]); 

free(List) ; 

return(newnode); 
> 
if (node->f lag == BRANCH)-t 

init (newnode); 

bit = shift path; 

/* walk down */ 

newnode = insert(path, value, node ->b [bit ]) ; 

/* update the hash value on the way back */ 

init (pair); 

pair [bit ] = newnode ->hash; 

if (node->b[not bit] == HULL) -[ 

pair[not bit] = Empty [length ( path) ] ; 

> else -t 

pair[not bit] = node->b[not bit]->hash; 

> 

node->hash = h(pair[0] . pair[l]); 

return(node); 
> 
if (node->f lag == LEAF){ 

/* change LEAF to BRANCH */ 

init (tmppath) ; 

init (tmpval); 

tmppath = node->e->path; 

tmpval = node->e->value; 

node->flag = BRANCH; 

/* move leaf's content down along its own path */ 

node->entry = NULL; 

node = insert (tmppath , tmpval , node ) ; 

/* insert new (path, value) in the now BRANCH */ 

bit = shift path; 

init (newnode); 

newnode = insert(path, value, node ->b [bit ]) ; 

/* update hash on the way back */ 

node->b[bit] = newnode; 

init (pair); 

pair [bit ] = newnode ->hash; 

if (node->b[not bit] == MULL) -[ 

pair[not bit] = Empty [length ( path) ] ; 

> else -t 

pair[not bit] = node->b[not bit]->hash; 
> 

node->hash = h(pair[0] . pair[l]); 
return(node); 
> 
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Listing 9: delete (): Deleting the entry at the end of a given path 

uct node * delete (bit string path, struct node *node)-[ 
if (node->f lag == LEAF) t 

/* remove LEAF, stop recursion */ 

free (node -> entry ) ; 

free (node); 

return NULL; 
> 

/* we're on the way to the leaf still */ 
init (pair); 
/* remember the bit */ 
bit = shift path; 

/* walk down */ 

node ->b [bit] = delete (path , node ->b [bit ] ) ; 

/* reconstruct the path */ 
path = prepend bit , path; 

/* how many non-children ?*/ 
for (i = 0, 1) { 

if (node->b[i] == NULL) { 
emp = i ; 

> 
> 
if(niimemp == 2)t 

/* node was a stalk with a single leaf, delete */ 
free (node); 
return NULL; 
> 
if(niimemp == 1){ 

if (node->b[not eitip]->flag == LEAF)t 

/* one child , a leaf , move it up */ 

node->e = node->b[not emp]->e; 

node->e->path = node ->e->path ; 

/* correct the path */ 

pop node->e->path; 

node->flag = LEAF; 

pairCemp] = Empty [length (path) ] ; 

pair[not emp] = node->b[not emp]->hash; 

node->hash = h(pair[0] . pair[l]); 

free (node->b[not emp]->e); 

free (node->btnot emp]); 

return node ; 
} else < 

/* branch node leading to at least two leafs */ 

/* update the hash */ 

pair[emp] = Empty [length (path) ] ; 

pair[not emp] = node->b[not emp]->hash; 

node->hash = h(pair[0] . pair[l]); 

return node; 
> 
> 
if (numemp == 0) { 

/* branch node with at least two leafs below */ 
/* update the hash */ 
pair[0] = node->b [0] ->hash; 
pair[l] = node->b [1] ->hash; 
node->hash = h(pair[0] . pair[l]); 
return node; 
> 
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